Each panel walks one path of code from a user program down to the hardware and back. Every step shows where control sits, what the kernel is doing, and where the bytes actually move.
How one call walks through the operating system
Every path below starts in user code and follows the same lane: the C library, the system call, the kernel, the driver, the device, and the wake-up that hands control back to the calling program. The numbered steps stay in lock-step with the diagram so the logic reads top to bottom.
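As a rough illustration of the first two lanes, the sketch below reaches the same kernel write handler twice: once through the C library wrapper and once through the generic syscall() entry point. It assumes Linux with glibc; the helper names write_via_libc and write_via_syscall are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Lane 1: the C library wrapper. It loads the arguments into registers
 * and executes the system call instruction on the caller's behalf. */
static ssize_t write_via_libc(int fd, const char *msg)
{
    assert(msg != NULL);
    return write(fd, msg, strlen(msg));
}

/* Lane 2: the generic syscall() entry point with an explicit call number.
 * It lands in the same kernel handler as the wrapper above. */
static ssize_t write_via_syscall(int fd, const char *msg)
{
    assert(msg != NULL);
    return (ssize_t)syscall(SYS_write, fd, msg, strlen(msg));
}
```

Both helpers return the number of bytes the kernel accepted, or -1 with errno set, so a caller cannot tell which lane was used.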
Why one load can be 100 times slower than another
Each scenario follows a single load or store from the CPU down through the caches until main memory actually sees it. The hit, miss, and shared-line cases show why the same line of C can behave very differently at runtime.
From a line of C to one instruction moving through the CPU
Four parts in order: a C snippet lowered to machine code, one instruction walking the pipeline, the block-level shape of a modern out-of-order core, and a small register and mnemonic cheat sheet. The point is to see, end to end, what a piece of C actually runs as.
How double buffering keeps video from tearing
Each path shows who owns each buffer at every moment: the camera is filling one while the display is reading the other, and a flip swaps them at the next refresh. The same idea applies to any double-buffered renderer.
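The ownership rule can be sketched in a few lines of C. This is a toy model under stated assumptions: the frame size is tiny, the struct and function names are hypothetical, and a real driver would flip from a vblank interrupt rather than a plain function call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 16   /* tiny frame, chosen only for the sketch */

struct double_buffer {
    uint8_t buf[2][FRAME_BYTES];
    int front;        /* index the display is scanning out */
    int back_ready;   /* set once the producer has finished the back buffer */
};

/* The producer (camera or renderer) only ever writes the back buffer. */
static uint8_t *db_back(struct double_buffer *db)
{
    return db->buf[1 - db->front];
}

/* The display only ever reads the front buffer. */
static const uint8_t *db_front(const struct double_buffer *db)
{
    return db->buf[db->front];
}

/* Producer signals that a complete frame is waiting. */
static void db_submit(struct double_buffer *db)
{
    db->back_ready = 1;
}

/* Called at the refresh boundary. Swaps only when a complete frame is
 * waiting, so the display never scans out a half-written buffer.
 * Returns 1 if the buffers were flipped, 0 otherwise. */
static int db_flip_at_vblank(struct double_buffer *db)
{
    if (!db->back_ready)
        return 0;
    db->front = 1 - db->front;
    db->back_ready = 0;
    return 1;
}
```

If the producer has not called db_submit() by the next refresh, the display simply shows the old front buffer again, which costs a repeated frame rather than a torn one.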
A thread sleeps until another thread wakes it
A condition variable is how one thread waits for shared state to change without burning the CPU. pthread_cond_wait drops the mutex and goes to sleep in one atomic step, then re-takes the mutex before it returns. This panel shows why both halves have to be atomic.
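A minimal sketch of the pattern with POSIX threads follows. The struct gate type and its helpers are hypothetical names for this example; only the pthread_* calls are real API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* the shared predicate, guarded by lock */
};

/* Waiter: sleeps without spinning until ready becomes true. */
static void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->ready)                           /* re-check after every wake */
        pthread_cond_wait(&g->cond, &g->lock);  /* unlock + sleep, atomically */
    /* pthread_cond_wait has re-taken the mutex before returning. */
    pthread_mutex_unlock(&g->lock);
}

/* Waker: changes the predicate under the mutex, then signals. */
static void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = true;
    pthread_mutex_unlock(&g->lock);
    pthread_cond_signal(&g->cond);
}

/* Thread entry point that opens the gate passed in arg. */
static void *opener_thread(void *arg)
{
    gate_open(arg);
    return NULL;
}
```

Because the predicate is changed while holding the mutex, the waiter cannot test ready, find it false, and then miss the signal before it is actually asleep.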
Always wait in a loop, while(!pred), because wakes can be spurious or consumed by another waiter first.
One store in a driver turns on an LED
A single write to a memory-mapped register lights a pin. The diagram follows that one instruction from the CPU, through the address translation, across the interconnect, into the device's register decoder, and finally to the physical output. It is the shortest possible path from C to hardware.
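In C, that one instruction is usually a store through a volatile pointer. The sketch below assumes a hypothetical write-1-to-set GPIO register; on real hardware the pointer would come from a mapped device address, while here any uint32_t can stand in for it.

```c
#include <assert.h>
#include <stdint.h>

/* Set one pin by writing its bit to a (hypothetical) write-1-to-set
 * register. volatile tells the compiler to emit exactly this store and
 * not to merge, reorder, or drop it relative to other volatile accesses. */
static inline void led_on(volatile uint32_t *set_reg, unsigned pin)
{
    assert(pin < 32);
    *set_reg = UINT32_C(1) << pin;
}
```

volatile only constrains the compiler. Ordering on the bus still depends on the memory type of the mapping and on any barriers the driver adds.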
Device-nGnRnE memory forbids gathering (merging), reordering, and early write acknowledgment, so MMIO writes to the device complete in issue order. DMB OSHST orders the store ahead of later stores; code that must wait until the write has actually reached the device needs a DSB instead. If the CPU MMU has no mapping for the GPIO region, the store takes a translation fault and the LED stays in its previous state.
Virtual memory, twice: how a hypervisor maps a guest
On bare metal an address goes through one page-table walk. Under a hypervisor every guest address is translated twice: once by the guest kernel and again by the host. Each path below walks through both stages and shows where a fault could land.
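The chained lookup can be modelled with two flat tables. This is a toy sketch, not how the hardware stores its tables: it assumes 4 KiB pages, an 8-page address space, and one array per stage, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     8            /* toy address space: 8 pages per table */
#define UNMAPPED   UINT64_MAX

enum xlate_result { XLATE_OK, FAULT_STAGE1, FAULT_STAGE2 };

/* s1 maps guest-virtual page -> intermediate-physical page (guest kernel).
 * s2 maps intermediate-physical page -> host-physical page (hypervisor).
 * On success the host physical address is stored in *hpa. */
static enum xlate_result translate(const uint64_t s1[NPAGES],
                                   const uint64_t s2[NPAGES],
                                   uint64_t gva, uint64_t *hpa)
{
    uint64_t off = gva & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    uint64_t vpn = gva >> PAGE_SHIFT;

    if (vpn >= NPAGES || s1[vpn] == UNMAPPED)
        return FAULT_STAGE1;    /* delivered to the guest kernel */

    uint64_t ipn = s1[vpn];
    if (ipn >= NPAGES || s2[ipn] == UNMAPPED)
        return FAULT_STAGE2;    /* delivered to the hypervisor */

    *hpa = (s2[ipn] << PAGE_SHIFT) | off;
    return XLATE_OK;
}
```

The return value shows where a fault lands: a missing stage-1 entry is the guest's problem, a missing stage-2 entry is the host's. Real hardware also translates the addresses of the stage-1 tables themselves through stage 2, which this model leaves out.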
Stage-1 uses TTBR0_EL1 (Translation Table Base Register 0, EL1) to walk guest page tables (VA → IPA, or VA → PA on bare metal). Stage-2 uses VTTBR_EL2 (Virtualization Translation Table Base Register, EL2) to map guest physical space into host physical space. Hardware chains both walks; guest software cannot bypass stage-2.
GVA: Guest Virtual Address. IPA: Intermediate Physical Address. HPA: Host Physical Address. EL1: guest kernel level. EL2: hypervisor level. TTBR0_EL1: guest translation table base register. VTTBR_EL2: stage-2 translation table base register.
Four ways a memory access can fault, and what the kernel does
From a user program, every page fault looks the same: a load that stalled. Inside the kernel they take four different paths. A minor fault hooks up a page that is already in memory. A major fault has to read from disk. A copy-on-write fault makes a private copy of a shared page. A file-backed fault loads through the page cache.
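The copy-on-write case is the easiest one to observe from user space. The sketch below assumes a POSIX system with fork(); the function name is hypothetical. It cannot show the fault itself, only its effect: the child's write never reaches the parent's copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent still sees after a forked child has
 * written to the same variable. */
static int parent_value_after_child_write(void)
{
    int *value = malloc(sizeof *value);   /* private, anonymous memory */
    assert(value != NULL);
    *value = 1;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        *value = 2;   /* first write in the child: it gets a private copy */
        _exit(*value == 2 ? 0 : 1);
    }

    int status = 0;
    assert(waitpid(pid, &status, 0) == pid);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

    int seen = *value;   /* the parent's page was never modified */
    free(value);
    return seen;
}
```

From the child's point of view the store simply took a little longer than usual, which is the point of the panel: the program sees a stall, and the kernel sees a specific fault type.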
After fork(), parent and child share anonymous pages as read-only. The first write faults on a writable VMA (Virtual Memory Area) with a read-only PTE. The handler allocates a fresh page, copies the data, marks the PTE writable, and retries the instruction.
How a device moves bytes without the CPU copying them
The CPU writes a small note in memory describing the work, then pokes the device once to say "go". The device reads memory directly, does the transfer, and fires an interrupt when it is done. This panel walks through that handoff and the cache flush that has to happen so the CPU and device do not see different bytes.
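The handoff can be mimicked entirely in user space. Everything below is a stand-in: the descriptor layout, the fake_dma_engine struct, and the function names are hypothetical, and a memcpy() plays the part of the device. Real engines define their own descriptor formats and complete asynchronously.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The "small note in memory". Layout is invented for this sketch. */
struct dma_desc {
    const void *src;
    void       *dst;
    uint32_t    len;
    uint32_t    done;   /* written by the device on completion */
};

struct fake_dma_engine {
    struct dma_desc *pending;   /* doorbell: descriptor the device fetches */
    int irq_raised;             /* stands in for the completion interrupt */
};

/* Device side: read the descriptor from memory, move the bytes, signal. */
static void fake_device_run(struct fake_dma_engine *eng)
{
    struct dma_desc *d = eng->pending;
    memcpy(d->dst, d->src, d->len);
    d->done = 1;
    eng->irq_raised = 1;
}

/* Driver side: fill in the descriptor, then poke the device once. */
static void dma_submit(struct fake_dma_engine *eng, struct dma_desc *d,
                       const void *src, void *dst, uint32_t len)
{
    d->src = src;
    d->dst = dst;
    d->len = len;
    d->done = 0;
    /* A real driver would clean the cache and issue a write barrier here,
     * so the device reads the descriptor and data the CPU just wrote. */
    eng->pending = d;        /* the doorbell write */
    fake_device_run(eng);    /* hardware would do this on its own */
}
```

The CPU touches only the descriptor and the doorbell. The payload bytes are moved by the device side of the model.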
HT = Half Transfer, TC = Transfer Complete, TE = Transfer Error, FE = FIFO Error. These are status bits the interrupt handler checks before it decides whether to queue the next buffer, retry, or reset the engine.
Coherent DMA memory needs no dma_sync_single_* calls. Depending on the platform it may be uncached or hardware-snoop-coherent, but software treats it as device-visible immediately. This is common for descriptor rings and doorbell shadow data.
Why 4 KiB appears in examples: demos often use page-sized chunks because they are easy to reason about, but descriptor length is not fixed to 4 KiB.
For streaming buffers, software must hand off ownership with dma_sync_single_for_device() before DMA and dma_sync_single_for_cpu() after DMA.
Virtual addresses: how the kernel turns a pointer into real memory
The CPU and any device that touches memory both go through a translation step. The CPU goes through the MMU. Devices go through the IOMMU. Both lookups end up pointing to a physical address in main memory. Under a hypervisor the chain runs twice.
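The first step of the CPU-side lookup is plain bit slicing. The sketch below assumes a 48-bit virtual address with a 4 KiB granule and four table levels of 512 entries each; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct va_fields {
    unsigned l0, l1, l2, l3;   /* 9-bit table indices, 0..511 */
    unsigned off;              /* 12-bit byte offset inside the page */
};

/* Split a virtual address into the four table indices and page offset. */
static struct va_fields split_va(uint64_t va)
{
    struct va_fields f;
    f.l0  = (unsigned)((va >> 39) & 0x1FF);   /* bits 47:39 */
    f.l1  = (unsigned)((va >> 30) & 0x1FF);   /* bits 38:30 */
    f.l2  = (unsigned)((va >> 21) & 0x1FF);   /* bits 29:21 */
    f.l3  = (unsigned)((va >> 12) & 0x1FF);   /* bits 20:12 */
    f.off = (unsigned)(va & 0xFFF);           /* bits 11:0  */
    return f;
}
```

Each index selects one entry in a table at its level, and the entry holds the address of the next table, so a full walk costs up to four memory reads before the data access itself.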
A 48-bit virtual address splits into four table indices: L0[47:39], L1[38:30], L2[29:21], L3[20:12]. Each level has 512 entries. off[11:0] selects the byte inside the final 4 KiB page.
TTBR0_EL1 points to the stage-1 root table for CPU accesses in the current address space. For device DMA, the SMMU context descriptor selects the device stage-1 table, then VTTBR_EL2 anchors stage-2.
TLB entries are tagged with {ASID, VPN}, so entries from different processes can coexist safely.
Why two threads see writes in different orders
The flag handshake shows how a release/acquire pair stops a reader from seeing the flag set while the data it guards is still stale. The store-buffer test goes further: both threads write one variable and read the other, and without a fence both reads can come back as zero on a real CPU, even though that seems impossible from the source.
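The flag handshake might look like this with C11 atomics and POSIX threads. The variable and function names are hypothetical; the point is only where the release and acquire orderings sit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data, published by the flag */
static atomic_int ready;   /* the flag; static storage starts at 0 */

/* Writer: fill in the data first, then raise the flag with release
 * ordering so the data write cannot be reordered after it. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: once the acquire load sees the flag, every write made before
 * the matching release store is guaranteed to be visible. */
static int consume(void)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;   /* spin until the flag is visible */
    return payload;
}
```

With both orderings relaxed, the reader could legally observe ready == 1 and still read the old payload. Release/acquire does not help the store-buffer test, which needs sequentially consistent accesses or a full fence.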
Four ways threads coordinate, side by side
The four scenarios sit side by side: a spinlock that burns CPU, a mutex that parks a waiting thread, a priority-inversion case where a high-priority thread waits on a low one, and a lock-free compare-and-swap loop. The point is to see when each one is the right tool. Condition variables get their own panel.
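The lock-free case is the least familiar of the four, so here is a small compare-and-swap loop in C11. It is a sketch of the general retry pattern, not the code behind the panel; bounded_increment is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Increment *counter unless it has already reached ceiling.
 * No lock is taken: if another thread changes the counter between our
 * read and our swap, the swap fails and we retry with the fresh value.
 * Returns the value this call left in the counter. */
static int bounded_increment(atomic_int *counter, int ceiling)
{
    int seen = atomic_load_explicit(counter, memory_order_relaxed);
    for (;;) {
        int want = seen < ceiling ? seen + 1 : seen;
        /* On failure, seen is refreshed with the current value. */
        if (atomic_compare_exchange_weak_explicit(counter, &seen, want,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
            return want;
    }
}
```

A CAS loop suits short read-modify-write updates of a single word. Once the update spans several fields or may block, a mutex is usually the simpler tool.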
LDXR reads the lock word and marks this core's exclusive attempt. STXR stores only if no competing write happened (0=success, 1=retry). DMB ISH (Data Memory Barrier, Inner Shareable) publishes prior shared-memory writes before lock handoff.
How a single byte travels from a wire into a kernel buffer
Each bus accepts one byte of payload and shows the receiver turning the waveform back into data the driver can read. The view follows the byte through edge sampling, the shift register, the framing check, the FIFO, the interrupt, and the final write into memory.
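For one concrete bus, the shift-register and framing-check steps can be written out in software. The sketch assumes a UART in 8N1 format (one start bit, eight data bits sent LSB first, one stop bit) and assumes the line has already been sampled once per bit at mid-bit; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 8N1 frame from 10 mid-bit samples:
 * bits[0] = start, bits[1..8] = data (LSB first), bits[9] = stop.
 * Returns 0 and stores the byte in *out, or -1 on a framing error. */
static int uart_decode_8n1(const uint8_t bits[10], uint8_t *out)
{
    if (bits[0] != 0)
        return -1;                  /* start bit must be low */

    uint8_t shift = 0;
    for (int i = 0; i < 8; i++)     /* the shift register, one bit per step */
        shift |= (uint8_t)((bits[1 + i] & 1u) << i);

    if (bits[9] != 1)
        return -1;                  /* framing check: stop bit must be high */

    *out = shift;
    return 0;
}
```

In hardware the decoded byte would next be pushed into the receive FIFO, and the interrupt fires once the FIFO reaches its threshold.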
How a camera frame becomes pixels in a user program
Light becomes electrons in the sensor. The sensor packages raw pixels and sends them over the camera bus. The image-signal processor turns those into an RGB frame, the device writes it into memory by itself, and the driver hands the buffer back to the calling program. The same shape applies to most capture devices.
SOF (Start Of Frame) and checks it at EOF (End Of Frame). The driver uses this to detect dropped/reordered frames, match DMA completions to the right buffer index, and keep VIDIOC_DQBUF in capture order.
VIDIOC_DQBUF. If userspace holds buffers too long, queue depth shrinks and frames may drop with V4L2_BUF_FLAG_ERROR.
How the kernel discovers a piece of hardware and binds a driver to it
On many systems the hardware does not announce itself. A small description file lists every device by name and address, and the kernel walks that list, matches each entry to a driver, and runs the driver's probe function. The three views show one step of that process at a time.
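The match-and-probe loop can be modelled without any kernel headers. The toy_* types, the "acme,toy-led" compatible string, and the return convention below are all invented for this sketch; only the numeric value 517 is borrowed from the Linux kernel's EPROBE_DEFER.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_EPROBE_DEFER 517   /* same value Linux uses for EPROBE_DEFER */

/* One entry from the hardware description. */
struct toy_node {
    const char *name;
    const char *compatible;
    int clock_ready;           /* stands in for a dependency such as a clock */
};

/* A driver: the string it claims and the function to run on a match. */
struct toy_driver {
    const char *compatible;
    int (*probe)(struct toy_node *);
};

/* Probe succeeds only once the node's dependency is available. */
static int toy_led_probe(struct toy_node *n)
{
    return n->clock_ready ? 0 : -TOY_EPROBE_DEFER;
}

static struct toy_driver toy_drivers[] = {
    { "acme,toy-led", toy_led_probe },   /* hypothetical compatible string */
};

/* Match a node to a driver by compatible string and run its probe.
 * Returns the probe result, or 1 if no driver claims the node. */
static int toy_bind(struct toy_node *n)
{
    for (size_t i = 0; i < sizeof toy_drivers / sizeof toy_drivers[0]; i++)
        if (strcmp(n->compatible, toy_drivers[i].compatible) == 0)
            return toy_drivers[i].probe(n);
    return 1;
}
```

A caller that gets the deferral code back would keep the node on a retry list and call toy_bind() again after other drivers have had a chance to register their resources.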
Views: Selected node, Probe log, DTS node → C driver, Driver C source, Camera overlay (live)
DTS → FDT → unflatten → probe
The kernel unflattens the FDT into a node tree, walks compatible strings, matches each node to a driver via of_device_id, and calls the driver's probe(). Probes that need clocks or regulators that are not yet ready return -EPROBE_DEFER and are retried.
From power-on to a login prompt
Each lane is a piece of software that runs during boot. Each arrow is one piece handing control to the next. The bands group the steps by phase, making it easy to spot where firmware stops and the kernel takes over.